209 research outputs found

Towards Benchmarking Multi-Model Databases

Author: Lu Jiaheng
Publication date: 01/01/2017

Multi-model database. Non-peer-reviewed.

Helsingin yliopiston digitaalinen arkisto

Jiaheng Joplin Lu, Viola

Author: Lu Jiaheng Joplin
Publication venue: SMU Scholar
Publication date: 15/11/2016

Cello Suite No. 4 in E-flat Major, BWV 1010 / J.S. Bach; Louange à l'Éternité de Jésus / Olivier Messiaen; Theme and Variations for viola & piano / Alan Schulma

Southern Methodist University

Jiaheng Joplin Lu, Viola

Author: Lu Jiaheng Joplin
Publication venue: SMU Scholar
Publication date: 11/04/2017

Cello Suite No. 3, Prelude / J.S. Bach; Concerto for viola and orchestra / Béla Bartók; Audition Excerpts; Elegiac Trio / Arnold Ba

Southern Methodist University

Jiaheng Joplin Lu, Viola

Author: Lu Jiaheng Joplin
Publication venue: SMU Scholar
Publication date: 11/04/2016

Sonata Op. 120 No. 2 / Johannes Brahms; Le tombeau de Ravel / Arthur Benjami

Southern Methodist University

Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

Author: Lu Jiaheng
Shi Juwei
Publication venue: IEEE
Publication date: 01/01/2021

Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical to implementing autonomic, self-managing big data systems. Building such a model is challenging because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution. This variation makes it difficult to estimate execution time accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck in order to accurately predict task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the resource allocation property among stages to estimate the overall execution plan. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Multi-model Data Management : What's New and What's Next?

Author: Holubová Irena
Lu Jiaheng
Publication date: 01/01/2017

Tutorial. As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see continued growth of systems that support massive volumes of non-relational or unstructured data. Nothing shows the picture more starkly than the Gartner Magic Quadrant for operational database management systems, which assumes that, by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform. Having a single data platform for managing both well-structured data and NoSQL data is beneficial to users; this approach significantly reduces integration, migration, development, maintenance, and operational issues. Therefore, a challenging research problem is how to develop an efficient, consolidated data management platform covering both relational data and NoSQL to reduce integration issues, simplify operations, and eliminate migration issues. In this tutorial, we review previous work on multi-model data management and provide insights on the research challenges and directions for future work. The slides and more materials of this tutorial can be found at http://udbms.cs.helsinki.fi/?tutorials/edbt2017. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

MORTAL: A Tool of Automatically Designing Relational Storage Schemas for Multi-model Data through Reinforcement Learning

Author: Lu Jiaheng
Yuan Gongsheng
Publication venue: CEUR-WS.org
Publication date: 01/01/2021

Because relational databases have powerful capabilities for handling security, user authentication, query optimization, etc., several commercial and academic frameworks reuse relational databases to store and query semi-structured data (e.g., XML, JSON) or graph data (e.g., RDF, property graph). However, these works concentrate on managing only one of the above data models with RDBMSs. That is, they do not exploit the underlying tools to automatically generate a relational schema for storing multi-model data. In this demonstration, we present a novel reinforcement learning-based tool called MORTAL. Specifically, given multi-model data containing different data models and a set of queries, it can automatically design a relational schema to store the data while achieving good query performance. To demonstrate it clearly, we focus on the following modules: generating the initial state based on loaded multi-model data, influencing the learning process by setting parameters, controlling the generated relational schema by providing semantic constraints, improving the query performance of the relational schema by specifying queries, and a highly interactive interface for showing query performance and storage consumption when users adjust the generated relational schema. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Worst Case Optimal Joins on Relational and XML data

Author: Al-Khalifa Shurug
Lu Jiaheng
Publication date: 10/06/2018

In the recent data management ecosystem, one of the greatest challenges is data variety. Data comes in multiple formats, such as relational and (semi-)structured data. A traditional database handles a single data format, and thus its ability to deal with different formats is limited. To overcome this limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e., XML) and design a worst-case optimal join algorithm. The salient feature of our algorithm is that it guarantees that the intermediate results are no larger than the worst-case join results. Preliminary results show that our multi-model algorithm significantly outperforms the baseline join methods in terms of running time and intermediate result size. Peer reviewed.

Crossref

Helsingin yliopiston digitaalinen arkisto
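
Illustrative note on the last entry above: the abstract for "Worst Case Optimal Joins on Relational and XML data" says the algorithm keeps intermediate results no larger than the worst-case join result. The sketch below is not the authors' algorithm and does not handle XML. It only illustrates, on purely relational data, the attribute-at-a-time style commonly associated with worst-case optimal joins, using the triangle query R(a, b), S(b, c), T(a, c) as an assumed example. The names `index_by_first` and `triangle_join` are hypothetical.

```python
from collections import defaultdict


def index_by_first(pairs):
    """Map each first component to the set of second components paired with it."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def triangle_join(R, S, T):
    """Join R(a, b), S(b, c), T(a, c) one attribute at a time.

    Illustrative assumption: instead of first materializing a pairwise join
    such as R joined with S, each attribute is bound by intersecting the
    candidate values allowed by every relation that mentions it.
    """
    r_by_a = index_by_first(R)  # a -> {b}
    s_by_b = index_by_first(S)  # b -> {c}
    t_by_a = index_by_first(T)  # a -> {c}

    out = []
    # Bind a: it must appear in both R and T.
    for a in r_by_a.keys() & t_by_a.keys():
        # Bind b: it must follow a in R and appear as a key in S.
        for b in r_by_a[a] & s_by_b.keys():
            # Bind c: it must follow b in S and follow a in T.
            for c in s_by_b[b] & t_by_a[a]:
                out.append((a, b, c))
    return sorted(out)


if __name__ == "__main__":
    R = [(1, 2), (1, 3), (2, 3)]
    S = [(2, 3), (3, 4)]
    T = [(1, 3), (1, 4), (2, 4)]
    print(triangle_join(R, S, T))  # [(1, 2, 3), (1, 3, 4), (2, 3, 4)]
```

Because every partial binding is checked against all relations that constrain it before the next attribute is bound, no pairwise intermediate relation is ever materialized. This is the general idea only; the formal worst-case guarantee depends on how the intersections are implemented.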